Goto

Collaborating Authors

 hyperparameter tuning


New Bounds for Hyperparameter Tuning of Regression Problems Across Instances

Neural Information Processing Systems

The task of tuning regularization coefficients in regularized regression models with provable guarantees across problem instances still poses a significant challenge in the literature. This paper investigates the sample complexity of tuning regularization parameters in linear and logistic regressions under \ell_1 and \ell_2 -constraints in the data-driven setting. For the linear regression problem, by more carefully exploiting the structure of the dual function class, we provide a new upper bound for the pseudo-dimension of the validation loss function class, which significantly improves the best-known results on the problem. Remarkably, we also instantiate the first matching lower bound, proving our results are tight. For tuning the regularization parameters of logistic regression, we introduce a new approach to studying the learning guarantee via an approximation of the validation loss function class.


New Bounds for Hyperparameter Tuning of Regression Problems Across Instances

Neural Information Processing Systems

The task of tuning regularization coefficients in regularized regression models with provable guarantees across problem instances still poses a significant challenge in the literature. This paper investigates the sample complexity of tuning regularization parameters in linear and logistic regressions under \ell_1 and \ell_2 -constraints in the data-driven setting. For the linear regression problem, by more carefully exploiting the structure of the dual function class, we provide a new upper bound for the pseudo-dimension of the validation loss function class, which significantly improves the best-known results on the problem. Remarkably, we also instantiate the first matching lower bound, proving our results are tight. For tuning the regularization parameters of logistic regression, we introduce a new approach to studying the learning guarantee via an approximation of the validation loss function class.


Deep Learning and Machine Learning -- Natural Language Processing: From Theory to Application

Chen, Keyu, Fei, Cheng, Bi, Ziqian, Liu, Junyu, Peng, Benji, Zhang, Sen, Pan, Xuanhe, Xu, Jiawei, Wang, Jinlang, Yin, Caitlyn Heqi, Zhang, Yichao, Feng, Pohsun, Wen, Yizhu, Wang, Tianyang, Li, Ming, Ren, Jintao, Niu, Qian, Chen, Silin, Hsieh, Weiche, Yan, Lawrence K. Q., Liang, Chia Xin, Xu, Han, Tseng, Hong-Ming, Song, Xinyuan, Liu, Ming

arXiv.org Artificial Intelligence

With a focus on natural language processing (NLP) and the role of large language models (LLMs), we explore the intersection of machine learning, deep learning, and artificial intelligence. As artificial intelligence continues to revolutionize fields from healthcare to finance, NLP techniques such as tokenization, text classification, and entity recognition are essential for processing and understanding human language. This paper discusses advanced data preprocessing techniques and the use of frameworks like Hugging Face for implementing transformer-based models. Additionally, it highlights challenges such as handling multilingual data, reducing bias, and ensuring model robustness. By addressing key aspects of data processing and model fine-tuning, this work aims to provide insights into deploying effective and ethically sound AI solutions.


GRUvader: Sentiment-Informed Stock Market Prediction

Mamillapalli, Akhila, Ogunleye, Bayode, Inacio, Sonia Timoteo, Shobayo, Olamilekan

arXiv.org Artificial Intelligence

Stock price prediction is challenging due to global economic instability, high volatility, and the complexity of financial markets. Hence, this study compared several machine learning algorithms for stock market prediction and further examined the influence of a sentiment analysis indicator on the prediction of stock prices. Our results were two-fold. Firstly, we used a lexicon-based sentiment analysis approach to identify sentiment features, thus evidencing the correlation between the sentiment indicator and stock price movement. Secondly, we proposed the use of GRUvader, an optimal gated recurrent unit network, for stock market prediction. Our findings suggest that stand-alone models struggled compared with AI-enhanced models. Thus, our paper makes further recommendations on latter systems.


Hyperparameter Tuning Through Pessimistic Bilevel Optimization

Ustun, Meltem Apaydin, Xu, Liang, Zeng, Bo, Qian, Xiaoning

arXiv.org Artificial Intelligence

Automated hyperparameter search in machine learning, especially for deep learning models, is typically formulated as a bilevel optimization problem, with hyperparameter values determined by the upper level and the model learning achieved by the lower-level problem. Most of the existing bilevel optimization solutions either assume the uniqueness of the optimal training model given hyperparameters or adopt an optimistic view when the non-uniqueness issue emerges. Potential model uncertainty may arise when training complex models with limited data, especially when the uniqueness assumption is violated. Thus, the suitability of the optimistic view underlying current bilevel hyperparameter optimization solutions is questionable. In this paper, we propose pessimistic bilevel hyperparameter optimization to assure appropriate outer-level hyperparameters to better generalize the inner-level learned models, by explicitly incorporating potential uncertainty of the inner-level solution set. To solve the resulting computationally challenging pessimistic bilevel optimization problem, we develop a novel relaxation-based approximation method. It derives pessimistic solutions with more robust prediction models. In our empirical studies of automated hyperparameter search for binary linear classifiers, pessimistic solutions have demonstrated better prediction performances than optimistic counterparts when we have limited training data or perturbed testing data, showing the necessity of considering pessimistic solutions besides existing optimistic ones.


Randomized-Grid Search for Hyperparameter Tuning in Decision Tree Model to Improve Performance of Cardiovascular Disease Classification

Pathak, Abhay Kumar, Chaubey, Mrityunjay, Gupta, Manjari

arXiv.org Artificial Intelligence

Cardiovascular disease refers to any critical condition that impacts the heart. Because heart diseases can be life-threatening. Researchers are focusing on designing smart systems to accurately diagnose them based on electronic health data, with the aid of machine learning algorithms. Heart disease classification using machine learning (ML) algorithms such as Support Vector Machine(SVM), Na\"ive Bayes(NB), Decision Trees (DTs) and Random Forests (RFs) are often hindered by overfitting. These ML algorithms need extensive hyperparameter tuning. Random Search offers a faster, and, more efficient exploration of hyperparameter space, but, it may overlook optimal regions. Grid Search, though exhaustive, but, it is computationally expensive and inefficient, particularly with high-dimensional data. To address these limitations, Randomized-Grid Search, a novel hybrid optimization method is proposed that combines the global exploration strengths of Random Search with the focused, and, exhaustive search of Grid Search in the most promising regions. This hybrid approach efficiently balances exploration and exploitation. The proposed model optimizes the hyperparameter for Decision Tree model. The proposed model is applied to UCI heart disease dataset for classification. It enhances model performance, provides improved accuracy, generalization, and computational efficiency. Experimental results demonstrate that Randomized-Grid Search outperforms traditional methods by significant margins. The proposed model provides a more effective solution for machine learning applications in healthcare diagnosis.


Hyperparameter Tuning is All You Need for LISTA

Neural Information Processing Systems

Learned Iterative Shrinkage-Thresholding Algorithm (LISTA) introduces the concept of unrolling an iterative algorithm and training it like a neural network. It has had great success on sparse recovery. In this paper, we show that adding momentum to intermediate variables in the LISTA network achieves a better convergence rate and, in particular, the network with instance-optimal parameters is superlinearly convergent. Moreover, our new theoretical results lead to a practical approach of automatically and adaptively calculating the parameters of a LISTA network layer based on its previous layers. Perhaps most surprisingly, such an adaptive-parameter procedure reduces the training of LISTA to tuning only three hyperparameters from data: a new record set in the context of the recent advances on trimming down LISTA complexity.


Optimizing Emotion Recognition with Wearable Sensor Data: Unveiling Patterns in Body Movements and Heart Rate through Random Forest Hyperparameter Tuning

Nur, Zikri Kholifah, Wijaya, Rifki, Wulandari, Gia Septiana

arXiv.org Artificial Intelligence

This research delves into the utilization of smartwatch sensor data and heart rate monitoring to discern individual emotions based on body movement and heart rate. Emotions play a pivotal role in human life, influencing mental well-being, quality of life, and even physical and physiological responses. The data were sourced from prior research by Juan C. Quiroz, PhD. The study enlisted 50 participants who donned smartwatches and heart rate monitors while completing a 250-meter walk. Emotions were induced through both audio-visual and audio stimuli, with participants' emotional states evaluated using the PANAS questionnaire. The study scrutinized three scenarios: viewing a movie before walking, listening to music before walking, and listening to music while walking. Personal baselines were established using DummyClassifier with the 'most_frequent' strategy from the sklearn library, and various models, including Logistic Regression and Random Forest, were employed to gauge the impacts of these activities. Notably, a novel approach was undertaken by incorporating hyperparameter tuning to the Random Forest model using RandomizedSearchCV. The outcomes showcased substantial enhancements with hyperparameter tuning in the Random Forest model, yielding mean accuracies of 86.63% for happy vs. sad and 76.33% for happy vs. neutral vs. sad.


Hyperparameters in Continual Learning: a Reality Check

Cha, Sungmin, Cho, Kyunghyun

arXiv.org Artificial Intelligence

Various algorithms for continual learning (CL) have been designed with the goal of effectively alleviating the trade-off between stability and plasticity during the CL process. To achieve this goal, tuning appropriate hyperparameters for each algorithm is essential. As an evaluation protocol, it has been common practice to train a CL algorithm using diverse hyperparameter values on a CL scenario constructed with a benchmark dataset. Subsequently, the best performance attained with the optimal hyperparameter value serves as the criterion for evaluating the CL algorithm. In this paper, we contend that this evaluation protocol is not only impractical but also incapable of effectively assessing the CL capability of a CL algorithm. Returning to the fundamental principles of model evaluation in machine learning, we propose an evaluation protocol that involves Hyperparameter Tuning and Evaluation phases. Those phases consist of different datasets but share the same CL scenario. In the Hyperparameter Tuning phase, each algorithm is iteratively trained with different hyperparameter values to find the optimal hyperparameter values. Subsequently, in the Evaluation phase, the optimal hyperparameter values is directly applied for training each algorithm, and their performance in the Evaluation phase serves as the criterion for evaluating them. Through experiments on CIFAR-100 and ImageNet-100 based on the proposed protocol in class-incremental learning, we not only observed that the existing evaluation method fail to properly assess the CL capability of each algorithm but also observe that some recently proposed state-of-the-art algorithms, which reported superior performance, actually exhibit inferior performance compared to the previous algorithm.


FedPop: Federated Population-based Hyperparameter Tuning

Chen, Haokun, Krompass, Denis, Gu, Jindong, Tresp, Volker

arXiv.org Artificial Intelligence

Federated Learning (FL) is a distributed machine learning (ML) paradigm, in which multiple clients collaboratively train ML models without centralizing their local data. Similar to conventional ML pipelines, the client local optimization and server aggregation procedure in FL are sensitive to the hyperparameter (HP) selection. Despite extensive research on tuning HPs for centralized ML, these methods yield suboptimal results when employed in FL. This is mainly because their "training-after-tuning" framework is unsuitable for FL with limited client computation power. While some approaches have been proposed for HP-Tuning in FL, they are limited to the HPs for client local updates. In this work, we propose a novel HP-tuning algorithm, called Federated Population-based Hyperparameter Tuning (FedPop), to address this vital yet challenging problem. FedPop employs population-based evolutionary algorithms to optimize the HPs, which accommodates various HP types at both client and server sides. Compared with prior tuning methods, FedPop employs an online "tuning-while-training" framework, offering computational efficiency and enabling the exploration of a broader HP search space. Our empirical validation on the common FL benchmarks and complex real-world FL datasets demonstrates the effectiveness of the proposed method, which substantially outperforms the concurrent state-of-the-art HP tuning methods for FL.